Fix build due to phasing off SecurityManager usage in favor of Java Agent #2657
d434ae4
to
8020351
Compare
OK, so the arbitrary FS access like |
So all Windows and MacOS builds are green, but on Linux, the attempt to escape the sandbox is failing |
Are there sandboxing rules somewhere? I believe we looked into oshi, but it also relies on the proc file system (https://github.com/oshi/oshi/blob/a38af2abed180de0246e7faeb057270d4e0716fb/oshi-core/src/main/java/oshi/util/platform/linux/ProcPath.java#L13). I'm not sure about other alternatives; maybe we could take a hint from the Java process args. I think it's just here for SIMD on the hardware (
|
@jmazanec15 we can identify the Something like below?
|
It is in security configuration - OS_PATH_CONF is readable to every plugin (we inherited this behavior from ES)
Looking for options as well |
Maybe so, but is calling another process better than reading from proc file system? |
@reta, per @cwperks's suggestion, can we add |
It is no better. You still violate the sandboxing boundaries. It only helps to solve the current build blocker. When we start intercepting |
@jmazanec15 the plugins should not be allowed to access anything outside the config folders, the k-NN should not be an exception I believe |
OK, if that is enforced now and is causing the failure, it seems we should refactor PlatformUtils in k-NN into a class in core, like FsProbe (https://github.com/opensearch-project/OpenSearch/blob/4d0ac04caa85567a8a5fb3c850e41a1349624587/server/src/main/java/org/opensearch/monitor/fs/FsProbe.java), which reads "/proc/diskstats", and take a dependency on that from the plugin. Alternatively, in the short term, to unblock the release, we could modify https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/resources/org/opensearch/bootstrap/security.policy#L206-L228 to grant access. |
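For readers wondering what the plugin wants from /proc/cpuinfo in the first place: the thread suggests it is only used to detect SIMD support. Below is a minimal sketch of such a check, assuming it boils down to looking for a flag such as `avx2` on the x86 `flags` line. The class and method names are hypothetical and are not taken from k-NN's PlatformUtils; the file read in `main` is exactly the kind of access the sandbox denies to plugin code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not the k-NN PlatformUtils implementation.
public final class CpuFlagsSketch {
    private CpuFlagsSketch() {}

    /** Returns true if a "flags" line in the given cpuinfo lines lists the flag. */
    public static boolean hasFlag(List<String> cpuInfoLines, String flag) {
        for (String line : cpuInfoLines) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank separator lines between processors
            }
            String key = line.substring(0, colon).trim();
            if (!key.equals("flags")) {
                continue; // assumption: x86 layout, where the key is "flags"
            }
            String[] flags = line.substring(colon + 1).trim().split("\\s+");
            if (Arrays.asList(flags).contains(flag)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        Path cpuInfo = Path.of("/proc/cpuinfo");
        if (Files.isReadable(cpuInfo)) {
            // Under the sandbox discussed here, this read is what gets denied.
            System.out.println(hasFlag(Files.readAllLines(cpuInfo), "avx2"));
        }
    }
}
```

Keeping the parsing separate from the file read means the privileged part is a single call, which is easier to move into core or to cache.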
};

grant codeBase "${codebase.opensearch}" {
  permission java.io.FilePermission "/proc/cpuinfo", "read";
I've also added these lines on cwperks#1 and the change appears to be taking effect on Windows and Mac, but not Linux.
I've also added these lines on cwperks#1 and the change appears to be taking effect on Windows and Mac, but not Linux.
I believe we need support from OpenSearch Policy here, will be working on it tonight
@jmazanec15 it seems we have a workaround for this issue. Some checks are failing, but they seem to be unrelated (sadly, I cannot rerun them). |
e7c5fb9
to
5df0ea2
Compare
Thanks @reta ! Retrying! |
Sorry, this needs a few more touches: the fix was fine but had side effects on tests. I'm getting that fixed, thank you @jmazanec15
sure - np let me know if it needs a retry! Thanks so much! |
3d0ffe3
to
4bdfb5a
Compare
FYI @reta, @prudhvigodithi showed the new core plugin working in JS: opensearch-project/job-scheduler#762 |
Yep, I will surely accommodate it once the builds are green, thanks @cwperks |
@@ -174,6 +176,14 @@ public class KNNPlugin extends Plugin
    private ClusterService clusterService;
    private Supplier<RepositoriesService> repositoriesServiceSupplier;

    static {
        ForkJoinPool.commonPool().execute(() -> {
If I understand this correctly, this works because the call stack now contains a single ProtectionDomain instead of the 4 from here: we compute the result once and cache it. Because the computation now happens statically on class load, only k-NN and JDK classes are on the call stack, rather than the call stack below, where the failure was occurring:
Full stack trace
2025-04-10T18:18:30.0745441Z » at org.opensearch.knn.jni.JNIService.initIndex(JNIService.java:50) ~[?:?]
2025-04-10T18:18:30.0747238Z » at org.opensearch.knn.index.codec.nativeindex.MemOptimizedNativeIndexBuildStrategy.lambda$buildAndWriteIndex$0(MemOptimizedNativeIndexBuildStrategy.java:63) ~[?:?]
2025-04-10T18:18:30.0748969Z » at java.base/java.security.AccessController.doPrivileged(AccessController.java:319) ~[?:?]
2025-04-10T18:18:30.0750859Z » at org.opensearch.knn.index.codec.nativeindex.MemOptimizedNativeIndexBuildStrategy.buildAndWriteIndex(MemOptimizedNativeIndexBuildStrategy.java:62) ~[?:?]
2025-04-10T18:18:30.0753366Z » at org.opensearch.knn.index.codec.nativeindex.remote.RemoteIndexBuildStrategy.buildAndWriteIndex(RemoteIndexBuildStrategy.java:152) ~[?:?]
2025-04-10T18:18:30.0755853Z » at org.opensearch.knn.index.codec.nativeindex.NativeIndexWriter.buildAndWriteIndex(NativeIndexWriter.java:159) ~[?:?]
2025-04-10T18:18:30.0757378Z » at org.opensearch.knn.index.codec.nativeindex.NativeIndexWriter.flushIndex(NativeIndexWriter.java:105) ~[?:?]
2025-04-10T18:18:30.0759129Z » at org.opensearch.knn.index.codec.KNN990Codec.NativeEngines990KnnVectorsWriter.flush(NativeEngines990KnnVectorsWriter.java:128) ~[?:?]
2025-04-10T18:18:30.0761709Z » at org.apache.lucene.codecs.perfield.PerFieldKnnVectorsFormat$FieldsWriter.flush(PerFieldKnnVectorsFormat.java:120) ~[lucene-core-10.1.0.jar:10.1.0 884954006de769dc43b811267230d625886e6515 - 2024-12-17 16:15:44]
2025-04-10T18:18:30.0764749Z » at org.apache.lucene.index.VectorValuesConsumer.flush(VectorValuesConsumer.java:76) ~[lucene-core-10.1.0.jar:10.1.0 884954006de769dc43b811267230d625886e6515 - 2024-12-17 16:15:44]
2025-04-10T18:18:30.0767204Z » at org.apache.lucene.index.IndexingChain.flush(IndexingChain.java:305) ~[lucene-core-10.1.0.jar:10.1.0 884954006de769dc43b811267230d625886e6515 - 2024-12-17 16:15:44]
2025-04-10T18:18:30.0769225Z » at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:456) ~[lucene-core-10.1.0.jar:10.1.0 884954006de769dc43b811267230d625886e6515 - 2024-12-17 16:15:44]
2025-04-10T18:18:30.0771702Z » at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:502) ~[lucene-core-10.1.0.jar:10.1.0 884954006de769dc43b811267230d625886e6515 - 2024-12-17 16:15:44]
2025-04-10T18:18:30.0773880Z » at org.apache.lucene.index.DocumentsWriter.maybeFlush(DocumentsWriter.java:456) ~[lucene-core-10.1.0.jar:10.1.0 884954006de769dc43b811267230d625886e6515 - 2024-12-17 16:15:44]
2025-04-10T18:18:30.0776468Z » at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:649) ~[lucene-core-10.1.0.jar:10.1.0 884954006de769dc43b811267230d625886e6515 - 2024-12-17 16:15:44]
2025-04-10T18:18:30.0778717Z » at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:578) ~[lucene-core-10.1.0.jar:10.1.0 884954006de769dc43b811267230d625886e6515 - 2024-12-17 16:15:44]
2025-04-10T18:18:30.0781115Z » at org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:382) ~[lucene-core-10.1.0.jar:10.1.0 884954006de769dc43b811267230d625886e6515 - 2024-12-17 16:15:44]
2025-04-10T18:18:30.0783673Z » at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:356) ~[lucene-core-10.1.0.jar:10.1.0 884954006de769dc43b811267230d625886e6515 - 2024-12-17 16:15:44]
2025-04-10T18:18:30.0786557Z » at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:346) ~[lucene-core-10.1.0.jar:10.1.0 884954006de769dc43b811267230d625886e6515 - 2024-12-17 16:15:44]
2025-04-10T18:18:30.0789243Z » at org.apache.lucene.index.FilterDirectoryReader.doOpenIfChanged(FilterDirectoryReader.java:112) ~[lucene-core-10.1.0.jar:10.1.0 884954006de769dc43b811267230d625886e6515 - 2024-12-17 16:15:44]
2025-04-10T18:18:30.0791886Z » at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:170) ~[lucene-core-10.1.0.jar:10.1.0 884954006de769dc43b811267230d625886e6515 - 2024-12-17 16:15:44]
2025-04-10T18:18:30.0794375Z » at org.opensearch.index.engine.OpenSearchReaderManager.refreshIfNeeded(OpenSearchReaderManager.java:72) ~[opensearch-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0796979Z » at org.opensearch.index.engine.OpenSearchReaderManager.refreshIfNeeded(OpenSearchReaderManager.java:52) ~[opensearch-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0799472Z » at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:167) ~[lucene-core-10.1.0.jar:10.1.0 884954006de769dc43b811267230d625886e6515 - 2024-12-17 16:15:44]
2025-04-10T18:18:30.0802154Z » at org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:240) ~[lucene-core-10.1.0.jar:10.1.0 884954006de769dc43b811267230d625886e6515 - 2024-12-17 16:15:44]
2025-04-10T18:18:30.0804640Z » at org.opensearch.index.engine.InternalEngine$ExternalReaderManager.refreshIfNeeded(InternalEngine.java:443) ~[opensearch-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0807389Z » at org.opensearch.index.engine.InternalEngine$ExternalReaderManager.refreshIfNeeded(InternalEngine.java:423) ~[opensearch-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0809781Z » at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:167) ~[lucene-core-10.1.0.jar:10.1.0 884954006de769dc43b811267230d625886e6515 - 2024-12-17 16:15:44]
2025-04-10T18:18:30.0812430Z » at org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:240) ~[lucene-core-10.1.0.jar:10.1.0 884954006de769dc43b811267230d625886e6515 - 2024-12-17 16:15:44]
2025-04-10T18:18:30.0813833Z » at org.opensearch.index.engine.InternalEngine.refresh(InternalEngine.java:1805) ~[opensearch-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0815048Z » at org.opensearch.index.engine.InternalEngine.refresh(InternalEngine.java:1782) ~[opensearch-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0816612Z » at org.opensearch.index.shard.IndexShard.refresh(IndexShard.java:1421) ~[opensearch-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0818081Z » at org.opensearch.action.admin.indices.refresh.TransportShardRefreshAction.lambda$shardOperationOnPrimary$0(TransportShardRefreshAction.java:101) ~[opensearch-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0819408Z » at org.opensearch.core.action.ActionListener.completeWith(ActionListener.java:344) ~[opensearch-core-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0820754Z » at org.opensearch.action.admin.indices.refresh.TransportShardRefreshAction.shardOperationOnPrimary(TransportShardRefreshAction.java:100) ~[opensearch-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0822317Z » at org.opensearch.action.admin.indices.refresh.TransportShardRefreshAction.shardOperationOnPrimary(TransportShardRefreshAction.java:57) ~[opensearch-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0823870Z » at org.opensearch.action.support.replication.TransportReplicationAction$PrimaryShardReference.perform(TransportReplicationAction.java:1333) ~[opensearch-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0826092Z » at org.opensearch.action.support.replication.ReplicationOperation.execute(ReplicationOperation.java:150) ~[opensearch-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0829035Z » at org.opensearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.runWithPrimaryShardReference(TransportReplicationAction.java:654) ~[opensearch-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0831286Z » at org.opensearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.lambda$doRun$0(TransportReplicationAction.java:547) ~[opensearch-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0832579Z » at org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82) ~[opensearch-core-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0834269Z » at org.opensearch.index.shard.IndexShard.lambda$wrapPrimaryOperationPermitListener$36(IndexShard.java:4203) ~[opensearch-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0836729Z » at org.opensearch.core.action.ActionListener$3.onResponse(ActionListener.java:132) ~[opensearch-core-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0838980Z » at org.opensearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:310) ~[opensearch-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0841585Z » at org.opensearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:255) ~[opensearch-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0843925Z » at org.opensearch.index.shard.IndexShard.acquirePrimaryOperationPermit(IndexShard.java:4174) ~[opensearch-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0846913Z » at org.opensearch.action.support.replication.TransportReplicationAction.acquirePrimaryOperationPermit(TransportReplicationAction.java:1262) ~[opensearch-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0850013Z » at org.opensearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.doRun(TransportReplicationAction.java:544) ~[opensearch-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0852885Z » at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0855836Z » at org.opensearch.action.support.replication.TransportReplicationAction.handlePrimaryRequest(TransportReplicationAction.java:483) ~[opensearch-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0858888Z » at org.opensearch.wlm.WorkloadManagementTransportInterceptor$RequestHandler.messageReceived(WorkloadManagementTransportInterceptor.java:63) ~[opensearch-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0861829Z » at org.opensearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:108) ~[opensearch-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0864064Z » at org.opensearch.transport.TransportService$7.doRun(TransportService.java:1048) ~[opensearch-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0866590Z » at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:975) ~[opensearch-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0868936Z » at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0870628Z » at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
2025-04-10T18:18:30.0871981Z » at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
2025-04-10T18:18:30.0872999Z » at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
2025-04-10T18:18:30.0875008Z » Caused by: java.lang.SecurityException: Denied OPEN (read) access to file: /proc/cpuinfo, domain: ProtectionDomain (file:/__w/k-NN/k-NN/build/testclusters/integTest-0/distro/3.0.0-ARCHIVE/lib/lucene-core-10.1.0.jar <no signer certificates>)
Note: JDK classes are filtered when extracting ProtectionDomains
With the failing call stack from above there are 4 unique ProtectionDomains:
- k-NN
- lucene-core
- opensearch
- opensearch-core
A ProtectionDomain is roughly analogous to a jar. It is a little more than that, but OpenSearch ties each one to exactly one "codeBase" (i.e., one jar). (A ProtectionDomain could be mapped to multiple codebases when using jarsigner, but my understanding is that this feature of JSM was not widely used.)
Since we are moving this here, we now evaluate permissions only for k-NN rather than for all of k-NN, lucene-core, opensearch and opensearch-core.
Limiting the stack walking would have the same effect but, as you pointed out, it relies on classes marked for removal.
I'm leaving a comment here to make sure my understanding is correct and for others reading this, but there is nothing to be addressed.
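To make the "one ProtectionDomain per jar on the stack" idea concrete, here is a minimal sketch that lists the distinct code source locations of the classes on the current call stack. It is not the Java Agent's actual check; the class name is hypothetical, and skipping frames with no CodeSource or a `jrt:` location is only an approximation of "JDK classes are filtered".

```java
import java.net.URL;
import java.security.CodeSource;
import java.security.ProtectionDomain;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical illustration, not the OpenSearch agent implementation.
public final class StackDomainsSketch {
    private StackDomainsSketch() {}

    /** Distinct code source locations (roughly: jars) of non-JDK classes on this thread's stack. */
    public static Set<String> codeSourcesOnStack() {
        return StackWalker.getInstance(StackWalker.Option.RETAIN_CLASS_REFERENCE)
            .walk(frames -> {
                Set<String> locations = new LinkedHashSet<>();
                frames.forEach(frame -> {
                    ProtectionDomain domain = frame.getDeclaringClass().getProtectionDomain();
                    CodeSource source = domain == null ? null : domain.getCodeSource();
                    URL location = source == null ? null : source.getLocation();
                    // Assumption: JDK classes show up with no location or a jrt: URL.
                    if (location != null && !"jrt".equals(location.getProtocol())) {
                        locations.add(location.toString());
                    }
                });
                return locations;
            });
    }

    public static void main(String[] args) {
        System.out.println(codeSourcesOnStack());
    }
}
```

Called from deep inside a refresh like the one in the trace above, a walk of this kind would be expected to report the k-NN, lucene-core, opensearch and opensearch-core jars; a permission check that requires every one of them to hold the grant fails as soon as one does not.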
I'm leaving a comment here to make sure my understanding is correct and for others reading this, but there is nothing to be addressed.
Correct, each thread has its own stack, and that is what we traverse.
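The per-thread stack point can be sketched as follows: a task handed to `ForkJoinPool.commonPool().execute(...)` runs on a worker thread, so a stack walk inside it does not see the frames of whoever triggered it. This is an illustration under assumptions, not the PR's code: `FreshStackSketch` and `DeepCaller` are hypothetical names, with `DeepCaller` standing in for the Lucene/OpenSearch call chain in the trace. The sketch waits on a `CompletableFuture` only so the result can be inspected; doing the same wait inside a static initializer could deadlock on class initialization.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

// Hypothetical illustration of "each thread has its own stack".
public final class FreshStackSketch {
    private FreshStackSketch() {}

    /** Distinct class names on the calling thread's stack. */
    static List<String> classNamesOnStack() {
        return StackWalker.getInstance().walk(frames ->
            frames.map(StackWalker.StackFrame::getClassName)
                  .distinct()
                  .collect(Collectors.toList()));
    }

    /** Runs the stack walk on a common-pool worker thread and returns what it saw. */
    static List<String> probeOnCommonPool() {
        CompletableFuture<List<String>> result = new CompletableFuture<>();
        ForkJoinPool.commonPool().execute(() -> result.complete(classNamesOnStack()));
        try {
            return result.get(10, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException("probe did not complete", e);
        }
    }

    /** Stand-in for the long call chain that normally sits above the check. */
    static final class DeepCaller {
        static List<String> direct() {
            return classNamesOnStack();
        }

        static List<String> viaCommonPool() {
            return probeOnCommonPool();
        }
    }

    public static void main(String[] args) {
        System.out.println("direct:   " + DeepCaller.direct());
        System.out.println("via pool: " + DeepCaller.viaCommonPool());
    }
}
```

The direct walk includes `DeepCaller`, while the walk on the pool thread sees only `FreshStackSketch` and JDK concurrency classes, which mirrors why the cached result is evaluated against the plugin's domain alone.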
010fbf3
to
e1d787f
Compare
…gent Signed-off-by: Andriy Redko <[email protected]>
@jmazanec15 we should be all set. The failing check is an intermittent timeout issue; I'd much appreciate it if you could retry. Thank you! |
LGTM - thanks so much @reta !!!
…f Java Agent (opensearch-project#2657)" This reverts commit 4201fb0.
Description
Fix build due to phasing off SecurityManager usage in favor of Java Agent
Related Issues
Please see https://github.com/opensearch-project/k-NN/actions/runs/14364233941/job/40358433955?pr=2652
Check List
--signoff
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.